POS Tags and Decision Trees for Language Modeling
نویسنده
چکیده
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. To use POS tags effectively, we use clustering and decision tree algorithms, which allow generalizations between POS tags and words to be effectively used in estimating the probability distributions. We show that our POS model gives a reduction in word error rate and perplexity for the Trains corpus in comparison to word and class-based approaches. By using the Wall Street Journal corpus, we show that this approach scales up when more training data is available.
منابع مشابه
بررسی مقایسهای تأثیر برچسبزنی مقولات دستوری بر تجزیه در پردازش خودکار زبان فارسی
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
متن کاملEstimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging
We present a HMM part-of-speech tagging method which is particularly suited for POS tagsets with a large number of fine-grained tags. It is based on three ideas: (1) splitting of the POS tags into attribute vectors and decomposition of the contextual POS probabilities of the HMM into a product of attribute probabilities, (2) estimation of the contextual probabilities with decision trees, and (3...
متن کاملLinguistic Issues in Language Technology – LiLT
Parsing learner data poses a great challenge for standard tools, since non-canonical and unusual structures may lead to wrong interpretations on the part of the taggers and parsers. It is well known that providing a statistical parser with perfect part-of-speech (POS) tags is of great benefit for parsing accuracy, and that parsing results can decrease considerably when the parser has to predict...
متن کاملA Comparison of Three Machine Learning Methods for Amazigh POS Tagging
Part of speech tagging (POS tagging) has a crucial role in different fields of natural language processing (NLP) including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi Words Term Extraction. This paper describes a set of experiments involving the application of three state-of the-art part-of-speech taggers to Amazigh texts, using a tagset of 28 tags. The taggers...
متن کاملPOS Tagging versus Classes in Language Modeling
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. The use of POS tags allows more sophisticated generalizations than are afforded by using a class-based approach. ...
متن کامل